File storage and retrieval method

ABSTRACT

A file storage and retrieval method or technique for processing alphanumeric information that has particular advantages when accessing data in a database on a computer. The retrieval technique uses the ASCII values of the characters in a search string, concatenated together, to form a numeric value that serves either as the index to the data itself or as the index that holds the computer address of the location of the data. This technique allows data to be accessed with only one access when searching for a word or phrase within a database, and it lends itself to use on static storage systems of the future as well as on current disk-based systems.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 11/378,122 entitled “FILE STORAGE AND RETRIEVAL SYSTEM,” filed Mar. 17, 2006, the contents of which are hereby incorporated by reference, and which claims priority to provisional application No. 60/775,753 filed on Feb. 21, 2006, the contents of which are hereby incorporated by reference.

1. FIELD OF THE INVENTION

This invention relates to a file storage and retrieval technique for processing alphanumeric information that has particular advantages in implementing database queries.

2. BRIEF DESCRIPTION OF THE PRIOR ART

In the computer arts, data is typically stored in some form of non-volatile storage system, such as magnetic disks, in the form of data files. These files are subdivided into data records, which are subsets of the file itself. Processing is done within each file by accessing the data records. Users conduct transactions against a file, inserting, deleting, retrieving and updating data records.

For very large databases it is extremely inefficient and time-consuming to sequentially search all data records in a file in order to find a particular record to access, modify or delete, or to locate the appropriate place to add a new record.

A more efficient, but still cumbersome and time-consuming, search method requires creating a search key for each data record which uniquely identifies the record. Each search key is associated with a record pointer that indicates the location in the computer storage system of the data record associated with the search key. A common type of pointer is a relative record number. Through the use of such record pointers, the data records themselves need not be kept in sequential order, but may be stored in random locations in the computer storage system. A search for a particular data record is enhanced by sequentially searching a compiled index of such search key records (comprising search keys and record pointers), rather than the data records themselves. However, such sequential searching is still relatively slow.
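
As a minimal sketch of the sequential key-index search described above, the following Python fragment scans a list of (search key, relative record number) pairs and follows the matching record pointer. The names `records`, `key_index` and `sequential_lookup`, as well as the sample data, are illustrative assumptions and do not appear in the original text.

```python
from typing import Optional

# Hypothetical data records, stored in no particular order.
# The list position plays the role of the relative record number.
records = ["record for Smith", "record for Adams", "record for Jones"]

# Hypothetical compiled index of search key records:
# each entry pairs a search key with a record pointer.
key_index = [("Adams", 1), ("Jones", 2), ("Smith", 0)]


def sequential_lookup(key: str) -> Optional[str]:
    """Scan the key index entry by entry and follow the pointer on a match."""
    for search_key, record_number in key_index:
        if search_key == key:
            return records[record_number]
    return None  # key is not present in the index


print(sequential_lookup("Jones"))
```

Because the scan visits index entries one at a time, the work grows in proportion to the number of keys, which is the slowness the paragraph above refers to.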

A much more efficient method for such a key index is to create a “tree” structure, rather than a sequential file, for the key records. One such tree structure is a B-Tree, an example of which is shown in FIG. 1. The use of B-Trees to structure indexes for data files in computer storage systems is well known in the prior art. (See, for example, Knuth, The Art of Computer Programming, Volume 3, pages 473-479).

A B-Tree consists of nodes which can be either a root node, branch nodes or leaf nodes. A branch node contains at least one search key and related pointers (such as relative addresses, node numbers) to other nodes. A leaf node contains at least one search key and a pointer to a data record. One node in the tree is the root node or starting point, which can be either a leaf node (only for a tree with a single node) or a branch node. The “height” of a tree is equivalent to the number of nodes traversed from the root node to a leaf node. Searching for a data record is accomplished by comparing a key to the contents of the root node, branching to branch nodes based on such comparisons, comparing the key to the contents of such branch nodes, and continuing “down” the height of the tree until a leaf node is reached. The key is compared to the contents of the leaf node, and one of the pointers in the leaf node is used to locate the desired data record (if one exists).
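
The search procedure described above can be sketched in Python as follows. This is an illustration under stated assumptions, not an implementation taken from the text: the `Node` class, the `search` function, the sample keys and the convention that keys equal to a branch key go to the right-hand child are all hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    keys: List[str]
    children: List["Node"] = field(default_factory=list)     # used by branch nodes
    record_numbers: List[int] = field(default_factory=list)  # used by leaf nodes

    @property
    def is_leaf(self) -> bool:
        return not self.children


def search(root: Node, key: str) -> Tuple[Optional[int], int]:
    """Return (record pointer or None, number of nodes visited)."""
    node, visited = root, 1
    while not node.is_leaf:
        # Compare the key with the node's keys to pick the branch to follow.
        node = node.children[bisect_right(node.keys, key)]
        visited += 1
    if key in node.keys:
        return node.record_numbers[node.keys.index(key)], visited
    return None, visited


# Hypothetical two-level tree: one root (branch) node and two leaf nodes.
leaf_a = Node(keys=["Adams", "Brown"], record_numbers=[4, 7])
leaf_b = Node(keys=["Jones", "Smith"], record_numbers=[2, 9])
root = Node(keys=["Jones"], children=[leaf_a, leaf_b])

print(search(root, "Smith"))
```

In this sketch the number of nodes visited equals the height of the tree, so each lookup costs one node access per level.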

In the simplest B-Tree (see FIG. 2), each node contains one search key and two associated pointers. Such a tree structure, sometimes referred to as a binary tree, theoretically provides a very efficient search method. If the number of nodes in this type of tree is equal to or less than 2^(n), then only “n” comparisons are required to locate a data record pointer in any leaf node.

In practice, a simple binary tree is inefficient. Most databases are stored on relatively slow storage systems, such as magnetic disks. The time required to access any item of data (such as a tree node) on such a storage device is dominated by the “seek” time required for the storage unit to physically locate the desired storage address. Following each seek, the contents of a node may be read into the high-speed memory of the computer system. In a simple binary tree, for each access of a node, only a two-way decision (to the left or right branch from that node) can be made since the node contains only one search key. If, instead of containing only one search key per node, a node contains several search keys, then for each seek operation, several keys will be read into the high-speed memory of the computer system. With one search key per node, a comparison and determination can be made that the item sought is in one half of the remainder of the tree. With “n−1” search keys per node, the search can be narrowed to “1/n” of the remainder of the tree (for example, with 9 search keys per node, a search can be narrowed to “1/10” of the remainder of the tree). This type of structure is known in the prior art as a “multi-way” tree. See FIG. 3.
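
The effect of the number of search keys per node can be illustrated with a short calculation. The sketch below assumes an idealized, fully balanced tree in which every node holds the same number of keys, so that each node access narrows the search to 1/n of what remains; the function name `levels_needed` and the figure of one million keys are hypothetical.

```python
def levels_needed(total_keys: int, keys_per_node: int) -> int:
    """Smallest number of node accesses that can single out one of total_keys
    items when each node offers keys_per_node + 1 branches (idealized model)."""
    fan_out = keys_per_node + 1
    levels, reachable = 0, 1
    while reachable < total_keys:
        reachable *= fan_out  # each extra level multiplies the reach by the fan-out
        levels += 1
    return levels


for keys_per_node in (1, 9, 99):
    print(keys_per_node, levels_needed(1_000_000, keys_per_node))
```

Under these assumptions, one million keys need 20 levels with one key per node, 6 levels with 9 keys per node, and 3 levels with 99 keys per node.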

It is advantageous to have as many search keys as possible per node. Thus, for each seek of a node, several search keys can be examined and a more efficient determination can be made as to the location of the next node or, in the case of a leaf node, of a data record. The height of the tree, and hence the search time, is dramatically decreased if the number of search keys per node is increased.

An even later development in tree structures is the B+-Tree, which uses query values in place of actual search key values in the branch nodes and places all key values in leaf nodes, as shown in FIG. 4. (See, for example, D. Comer, “The Ubiquitous B-Tree,” ACM Computing Surveys, Vol. 11, No. 2, June 1979, pgs. 130-131). This structure has all of the advantages of B-Trees as well as having a much smaller index for accessing keys. B+-Trees use multiple query values in branch nodes, as shown in FIG. 5 and FIG. 7, as well as multiple keys per leaf node, as shown in FIGS. 6 and 7.

The state of the art appears to be some form of the B-Tree or the B+-Tree, as shown here as prior art.

The process of actually building a tree structure using any of the current methods results in branches becoming unbalanced based on the order in which keys are added and/or deleted from the database. When the branches are out of balance, more than 2^(n)+1 comparisons may be required when searching for one particular key. These structures must be balanced regularly and never remain in a totally balanced condition. B+-Trees additionally may require new query values to be established while being balanced.
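
The sensitivity to insertion order can be illustrated with a plain binary search tree that has no rebalancing step. That choice is an assumption made purely for illustration, since the text does not specify an insertion algorithm; the names `BSTNode`, `insert`, `height` and `build` are hypothetical.

```python
class BSTNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key without any rebalancing and return the root of the tree."""
    if root is None:
        return BSTNode(key)
    node = root
    while True:
        side = "left" if key < node.key else "right"
        child = getattr(node, side)
        if child is None:
            setattr(node, side, BSTNode(key))
            return root
        node = child


def height(node):
    """Number of nodes on the longest path from this node down to a leaf."""
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))


def build(keys):
    root = None
    for key in keys:
        root = insert(root, key)
    return root


# The same seven keys, added in two different orders.
sorted_order = build([1, 2, 3, 4, 5, 6, 7])
mixed_order = build([4, 2, 6, 1, 3, 5, 7])
print(height(sorted_order), height(mixed_order))
```

With keys added in ascending order the tree degenerates into a chain of height 7, while the second order yields a height of 3 for the same seven keys.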

The present invention solves the problem of accessing data in a database. Since all words and/or selected phrases in the database are stored as concatenated strings of their ASCII characters, any word or phrase can be found with only one access. This requires systems that support very large indexes within a database, which computer systems of the very near future are expected to provide. This process may also be scaled back for use on today's systems.

SUMMARY OF THE INVENTION

The inventive file storage and retrieval method uses the ASCII values of the characters in the search string concatenated together to form a numeric value which serves as the index to the data itself (see FIG. 8) or as the index which holds the address of the location of the data in the system (see FIG. 9). When a database is created, these ASCII character string indexes are written into the file to allow quick access when retrieving information. This means that any character string written to a file can be retrieved directly in just one access.
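
A minimal sketch of the index computation described above: the decimal ASCII value of each character is written out and the digits are joined in order from the first character to the last. The function name `ascii_index` is hypothetical, and the use of unpadded decimal codes is an assumption, since the text does not state whether each code is padded to a fixed width.

```python
def ascii_index(text: str) -> int:
    """Concatenate the decimal ASCII values of the characters of text, in order,
    and return the result as one integer (unpadded codes are an assumption)."""
    if not text:
        raise ValueError("an empty string has no index value")
    if not text.isascii():
        raise ValueError("only ASCII characters are supported in this sketch")
    return int("".join(str(ord(ch)) for ch in text))


# 'C' = 67, 'A' = 65, 'T' = 84
print(ascii_index("CAT"))
```

For the string "CAT" the codes 67, 65 and 84 give the index 676584. For printable ASCII characters (codes 32 to 126), every two-digit code begins with a digit from 3 to 9 and every three-digit code begins with 1, so the unpadded form can still be split back into its characters unambiguously.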

BRIEF DESCRIPTION OF THE DRAWINGS

The features of the present invention which are believed to be novel are set forth with particularity in the appended claims. The present invention, both as to its organization and manner of operation, together with further objects and advantages thereof, may best be understood with reference to the following description, taken in connection with the accompanying drawings in which:

FIG. 1 is a schematic diagram of the B-Tree Index Structure for indexing data records in a computer storage system. This diagram shows the basic elements of tree structures.

FIG. 2 is a schematic diagram of a B-Tree Index Structure, Binary Tree, w/1 Key and 2 Branches/Node.

FIG. 3 is a schematic diagram of a B-Tree Index Structure, Multi-Way Tree, w/2 Keys and 3 Branches/Node.

FIG. 4 is a schematic diagram of a B+-Tree Index Structure w/1 Query Value and 2 Branches/Node and 1 Key/Leaf Node.

FIG. 5 is a schematic diagram of a B+-Tree Index Structure w/2 Query Values and 3 Branches/Node and 1 Key/Leaf Node.

FIG. 6 is a schematic diagram of a B+-Tree Index Structure w/1 Query Value and 2 Branches/Node and 3 Keys/Leaf Node.

FIG. 7 is a schematic diagram of a B+-Tree Index Structure w/2 Query Values and 3 Branches/Node and 3 Keys/Leaf Node.

FIG. 8 is a schematic diagram of concatenated ASCII String Values used as an address with the data stored at the same location.

FIG. 9 is a schematic diagram of concatenated ASCII String Values used as an indirect address which holds the address of the stored data.

FIG. 10 is a schematic diagram showing concatenated ASCII values of a string used as an index for storing associated data. The data is stored at the same index.

FIG. 11 is a schematic diagram showing concatenated ASCII values of a string used as an index for storing associated data. The associated data is stored at another indirect address.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

Throughout this description, the preferred embodiment and examples shown should be considered as exemplars, rather than limitations on the method of the present invention.

The inventive file storage and retrieval technique is primarily designed for use with large data files running on static (non-movable) media systems of the future. This creative idea, however, lends itself to use in current-day systems as well.

This inventive structure is best illustrated through an example as follows:

Any indexing scheme may be used to create this database. When a string of characters is used as a key to the database, the prior and next pointers have already been established and are stored as data in the database record. The concatenated ASCII values of the string may be used as an index to store the associated data. This data is then written to the database as shown in FIG. 10. In this example the data is written at the index constituting the concatenated ASCII values of the characters in the string.
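
The direct arrangement described above can be modeled in a few lines of Python. This is a sketch under stated assumptions: a dict stands in for the addressable storage (a dict is itself a hash table, so the model shows the addressing scheme rather than the physical access count), the index uses unpadded decimal codes, and the names `storage`, `store_direct` and `fetch_direct` and the record layout are hypothetical.

```python
def ascii_index(text: str) -> int:
    """Concatenated decimal ASCII values of text (unpadded; an assumption)."""
    return int("".join(str(ord(ch)) for ch in text))


# Stand-in for the storage system: each dict key plays the role of an address.
storage = {}


def store_direct(text, data, prior_index=None, next_index=None):
    """Write the record at the index formed from the string itself."""
    storage[ascii_index(text)] = {
        "string": text,
        "prior": prior_index,  # pointer to the prior index, if already established
        "next": next_index,    # pointer to the next index, if already established
        "data": data,
    }


def fetch_direct(text):
    """Retrieve the record by computing the index and reading that location."""
    return storage.get(ascii_index(text))


store_direct("CAT", "a small domesticated feline")
print(fetch_direct("CAT")["data"])
```

Retrieval involves no key comparisons or tree traversal in this model: the index is computed from the search string and the record is read from that location.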

Data may also be written to the database as shown in FIG. 11. This figure shows an example of storing the data related to the string characters at an indirect address which is stored at the index constituting the concatenated ASCII values of the string characters.
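
The indirect arrangement can be modeled in the same way. In this sketch, which rests on the same assumptions as before (unpadded decimal codes, a dict standing in for addressable storage), the location at the concatenated index holds only an address, and the record itself lives in a separate data area. The names `index_table`, `data_area`, `store_indirect` and `fetch_indirect` are hypothetical.

```python
def ascii_index(text: str) -> int:
    """Concatenated decimal ASCII values of text (unpadded; an assumption)."""
    return int("".join(str(ord(ch)) for ch in text))


index_table = {}  # concatenated-ASCII index -> address of the record
data_area = []    # records stored at consecutive addresses


def store_indirect(text, data):
    """Append the record to the data area and save its address at the index."""
    address = len(data_area)
    data_area.append({"string": text, "data": data})
    index_table[ascii_index(text)] = address
    return address


def fetch_indirect(text):
    """Read the address held at the index, then read the record at that address."""
    address = index_table.get(ascii_index(text))
    return None if address is None else data_area[address]


store_indirect("CAT", "a small domesticated feline")
store_indirect("DOG", "a domesticated canine")
print(index_table[ascii_index("DOG")], fetch_indirect("DOG")["data"])
```

Here the index table stays small per entry, since each slot holds only an address, while the records can be placed wherever space is available.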

The indexes created by concatenating the ASCII values of the string characters may become very large, but systems of the future should easily be able to accommodate addresses well over ten to the 100th power.
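
How quickly the index grows can be checked with a short calculation. The sketch below assumes unpadded decimal ASCII codes, so each character contributes two or three digits; the function names `ascii_index` and `index_digits` are hypothetical.

```python
def ascii_index(text: str) -> int:
    """Concatenated decimal ASCII values of text (unpadded; an assumption)."""
    return int("".join(str(ord(ch)) for ch in text))


def index_digits(text: str) -> int:
    """Number of decimal digits in the index formed from text."""
    return len(str(ascii_index(text)))


print(index_digits("database"))
print(ascii_index("z" * 34) > 10 ** 100)
```

The eight-letter word "database" already produces a 20-digit index, and a string of 34 lowercase "z" characters (code 122 each) produces a 102-digit index, which exceeds ten to the 100th power.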

1. (canceled)

2. (canceled)

3. A file storage and retrieval method for accessing data in a database on a computer comprising the steps of: in a search string comprised of alphanumeric characters, using an ordered numeric value for each of said alphanumeric characters; concatenating together said ordered numeric value of each of said alphanumeric characters beginning with an ordered numeric value of a first character in said search string and continuing to concatenate ordered numeric values of said alphanumeric characters sequentially through all said alphanumeric characters in said search string to a last character of said search string to form a numeric index value that serves as an index; wherein at said index either an address or related information resides; wherein said related information is selected from a group consisting of said search string, pointers to a prior index, pointers to a next index, pointers to a prior address, pointers to a next address, and any remaining data; wherein said related information is stored at said index or wherein said numeric index value serves as an index to said address where said related information is stored; wherein said related information or said address may be accessed by only one access.

4. A method of accessing data in a computer database comprising the steps of: in a word or phrase comprised of characters, substituting ordered numeric values for said characters of said word or phrase; concatenating said ordered numeric values of said characters of said word or phrase beginning with an ordered numeric value of a first character in said word or phrase and continuing to concatenate an ordered numeric value of each of said characters sequentially through all said characters in said word or phrase to a last character of said word or phrase to create an index value that serves as an index; wherein at said index either an address or related information resides; wherein said related information is selected from a group consisting of said word or phrase, pointers to a prior index, pointers to a next index, pointers to a prior address, pointers to a next address, and any remaining data; finding said word or phrase stored in a computer with only one access using said index of said database, wherein said word or phrase and said related information is stored at said index in just one access or finding said address where said word or phrase and said related information is stored in said computer in just one access.

5. A computer database file storage and retrieval method comprising the steps of: in a search string comprised of characters, using ordered numeric values for said characters of said search string; concatenating said ordered numeric values of said characters of said search string together beginning with an ordered numeric value of a first character in said search string and continuing to concatenate an ordered numeric value of each of said characters sequentially through all of said characters in said search string to a last character of said search string to form a numeric value that serves as an index; wherein at said index either an address or related information resides; wherein said related information is selected from a group consisting of said search string, pointers to a prior index, pointers to a next index, pointers to a prior address, pointers to a next address, and any remaining data; utilizing said numeric value as said index to access said related information stored in a computer at said index or utilizing said numeric value as said index to access said address of a location that the related information is stored in said computer; wherein said search string and said related information can be retrieved from the database file in just one access or wherein the address where the search string and said related information is stored in said computer can be retrieved from the database file in just one access.

6. An indexing scheme for a computer database comprising the steps of: in a string of characters, using ordered numeric values of said characters in said string; concatenating said ordered numeric values of said characters beginning with an ordered numeric value of a first character in said string and continuing to concatenate an ordered numeric value of each of said characters sequentially through all said characters in said string to a last character of said string for use as an index; wherein at said index either an address or related information resides; wherein said related information is selected from a group consisting of said string of said characters, pointers to a prior index, pointers to a next index, pointers to a prior address, pointers to a next address, and any remaining data; using said concatenated ordered numeric values of said characters as said index to a database to store in a computer, said related information at one of said index or at said address, or using said concatenated ordered numeric values to retrieve the related information from said computer wherein said string of said characters can be retrieved from said index in just one access or wherein the address of the related information stored in the computer can be retrieved from the database in just one access.